Tell me whereon the likelihood depends.

Wm.  Shakespeare 
As  You  Like  It 
Act 1, Scene 3, 56.

Life  is  the  art  of  drawing  sufficient 
conclusions  from  insufficient  premises. 

Samuel  Butler 
Notebooks 


1.  INTRODUCTION 

It  is  a  great  honor  to  present  a  lecture  named  after  Sir  R.A.  Fisher,  especially  at  a 
session  of  the  International  Statistical  Institute,  an  organization  on  whose  behalf  he 
expended  so  much  energy.  Fisher  was  one  of  the  most  productive  and  original 
statisticians  of  this  century,  and  much  of  modern  statistical  theory  and  methods  has  its 
origins in his work. This is especially true of the current methods for the analysis of
categorical data via loglinear models, the topic of my lecture.

The work I shall describe has as its foundation Fisher's notions of "likelihood" and
"sufficiency," and the general theory for loglinear models is intimately linked to results
for exponential families that are implicit in some of Fisher's most profound theoretical
papers. Amongst Fisher's contributions to statistical methodology are several papers on
contingency table analysis and the distribution of chi-square statistics (see Fienberg,
1980a for a discussion of this work). These, along with Fisher's observations in other
papers and suggestions by Fisher to his colleagues, serve as the precursors to the more
general results that have been the focus of attention in recent years.


Fisher was not simply a great statistician; he was also a great scientist. And he
worked hard at translating his theoretical statistical results into practical methods, of use
to the biologists and agricultural scientists with whom he worked. For example, it was for
them that Fisher wrote Statistical Methods for Research Workers, a book that has
served as a statistical bible for statisticians and non-statisticians alike since it was first
published in 1925. Thus, in the spirit of Fisher's own work, I shall discuss not only
the basic statistical theory for the analysis of categorical data using loglinear models,
but also the implications of this theory for general statistical practice in the reporting
of tabular materials, and some of the exciting new substantive areas where the theory is
currently being put to practice.


Sir R.A. Fisher was elected a member of the International Statistical Institute in 1931.
Beginning at the end of World War II, he worked with Stuart A. Rice to revitalize and
reorganize the Institute, which had been dominated up to that time by Europeans and
by government statisticians. Over the next 11 years, Fisher struggled to open up the
ISI membership to research statisticians and to integrate their activities with those of
statisticians of other persuasions. In her biography of Fisher, his daughter (Box, 1978)
chronicles these activities, and quotes from a letter he wrote in 1956, as follows:

We really have a terrifically long way to go in making the Institute as useful
as it could be, since I think the great majority of our foreign membership
quite take it for granted that it is primarily an assembly of officials
concerned with national statistics, vital and economic, and of their more
academic economic advisers. These people cannot deny the importance of
mathematical statistics . . . and if we put in undeniably good mathematicians
who insist on talking of the natural sciences and in terms of scientific
research and holding sessions relevant to the applications of mathematical
statistics to scientific research, we have done a pretty good generation's work.

Box (1978, p. 433).

Fisher was not completely successful in these attempts, but he continued to work on ISI
activities, and participate in its meetings. In recognition of his many contributions, the
Institute elected Fisher as an honorary member in 1950, and along with P.C.
Mahalanobis as Honorary President in 1957 (only two others had been previously so
honored). Even in his "retirement," Fisher travelled to Japan to attend the 1960 ISI
meetings, and to Paris to attend the 1961 meetings, the last ones held before his death
in 1962.

The next section outlines the statistical theory for loglinear models in the analysis of
categorical data, and links it to the more general theory of exponential families. We
focus there on maximum likelihood estimation, its use of minimal sufficient statistics,
and methods for assessing the goodness of fit of a model. Section 3 briefly describes
the application of loglinear model methods for the analysis of multi-dimensional
contingency tables, and then takes the form of an aside. In it we discuss the
implications of loglinear model theory for the reporting of results from large-scale
government sample surveys, especially in the form of tables of cross-classified counts.
In Section 4, we turn to the applications of the results of Section 2 to "non-contingency
table" problems in (a) the Bradley-Terry paired comparisons model, (b) the
analysis of social networks, and (c) the use of the Rasch model in intelligence testing
and its potential for innovative survey analysis. In each case, the non-contingency table
problem is transformed and is re-represented as a problem in contingency-table form,
whose solution has been studied previously.

Much  of  modern  statistical  practice  relies  heavily  on  the  computational 
implementation  of  methodology.  In  Section  5  of  this  paper,  we  briefly  summarize  the 
state  of  the  art  of  computation  for  loglinear  model  methods,  and  mention  some  topics 
of  current  research  activity  that  may  allow  these  methods  to  be  of  greater  practical  use 
in  the  future. 

2.  LOGLINEAR  MODELS  AND  EXPONENTIAL  FAMILY  THEORY 

The analysis of categorical data focuses on the fitting of models to collections of
counts, often fashioned into the format of cross-classifications or contingency tables.
For purposes of describing the loglinear model approach to such analyses we will
consider a vector of observed counts falling into t cells,

    x = (x_1, x_2, \ldots, x_t).    (2.1)


These counts are realizations of a set of random variables


Then  if  the  counts  in  these  sets  are  observations  from  r  independent  multinomial 
distributions,  the  sums 

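    n_k = \sum_{i \in J_k} x_i,  k = 1, 2, \ldots, r,    (2.9)

are fixed by design, where J_k denotes the k-th of these sets of cells. The probability
density or likelihood function for this general situation is

    \prod_{k=1}^{r} \left[ n_k! \prod_{i \in J_k} \frac{1}{x_i!} \left( \frac{m_i}{n_k} \right)^{x_i} \right],    (2.10)

subject to the constraints

    \sum_{i \in J_k} m_i = n_k  for k = 1, 2, \ldots, r.    (2.11)

Each of the constraints in (2.11) can be characterized by a vector whose components
are 1 if i \in J_k and 0 otherwise.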

When r = 1, we have observations from a single multinomial. When r = 2 and t =
4, we have observations from two binomials. Thus, the product-multinomial includes
two of the most widely used sampling models for the 2x2 table, i.e. the two-binomial
model and the single four-cell multinomial model.

Both the Poisson and product-multinomial sampling models are special cases of the
exponential family of distributions, introduced first by Fisher in his 1934 invited address
to the Royal Statistical Society (Fisher, 1935), and elaborated upon by Darmois,
Koopman, and Pitman. The general form of the exponential family density (e.g. see
Andersen, 1980 or Barndorff-Nielsen, 1978) is

    f(t_1, t_2, \ldots \mid \theta_1, \theta_2, \ldots) = [c(\theta_1, \theta_2, \ldots)]^{-1} \exp\{ \sum_i \theta_i t_i \} h(t_1, t_2, \ldots).    (2.12)

Both (2.7) and (2.10) can be written in this form, with t_i = x_i and \theta_i = \lambda_i, although
(2.10) is subject to the constraints (2.11), leading to the use of adjusted \theta_i's based on
the differences of \lambda_i's (for details, see Andersen, 1980, pp. 20-27). Exponential family
theory suggests that the log-expectations \lambda_i should be the key parameters of interest.
By reexpressing the \lambda_i's as linear functions of a reduced number of parameters, we
arrive at the notion of loglinear models for the two basic sampling models.


A  well-known  result  in  basic  probability,  exploited  by  Fisher  in  much  of  his  work  on 
categorical  data  problems,  links  the  Poisson  and  product-multinomial  models: 

RESULT 1. Suppose that X follows the Poisson sampling model. Then the
conditional distribution of X, given the restrictions (2.9), is that of the product
multinomial in (2.10).
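
Result 1 can be checked numerically in the simplest case, r = 1, where conditioning independent Poisson counts on their total should give a single multinomial. The following is a minimal sketch; the means `ms` and counts `xs` are made-up illustrative values, and the function names are hypothetical.

```python
from math import exp, factorial, prod

def poisson_pmf(x, m):
    """Probability of count x under a Poisson distribution with mean m."""
    return exp(-m) * m**x / factorial(x)

def multinomial_pmf(xs, probs):
    """Multinomial probability of the counts xs with cell probabilities probs."""
    coef = factorial(sum(xs))
    for x in xs:
        coef //= factorial(x)  # exact integer division at every step
    return coef * prod(p**x for p, x in zip(probs, xs))

def conditional_poisson_pmf(xs, ms):
    """P(X = xs | sum of X = n) for independent Poisson counts with means ms."""
    joint = prod(poisson_pmf(x, m) for x, m in zip(xs, ms))
    # The total of independent Poisson counts is Poisson with the summed mean.
    total = poisson_pmf(sum(xs), sum(ms))
    return joint / total

ms = [2.0, 3.5, 0.5]   # illustrative Poisson means (assumed values)
xs = [1, 4, 0]         # illustrative observed counts (assumed values)
probs = [m / sum(ms) for m in ms]

lhs = conditional_poisson_pmf(xs, ms)
rhs = multinomial_pmf(xs, probs)
print(lhs, rhs)
```

The two printed probabilities should agree up to floating-point error, with multinomial cell probabilities m_i / \sum m_i.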


To specify a class of loglinear models for the vector of expectations, m, we need to
specify a linear subspace of the t-dimensional space in which the vector of
log-expectations, \lambda, lies. Call this subspace M (for model!). Thus we can represent the
components of \lambda as linear combinations, \lambda = \lambda(\theta), of newly defined parameters \theta, and
we preserve the exponential family structure of (2.12). We now turn to the problem of
maximum likelihood estimation of the loglinear parameters \theta, and of \lambda = \lambda(\theta) itself.


The following general results on maximum likelihood estimation were originally
developed by Birch (1963), and later extended by Bishop (1969), Haberman (1974), and
others. They turn out to be special cases of more general results for exponential
families, as has been noted by Dempster (1971) and others.


RESULT 2. Corresponding to each parameter in \theta there is a minimal sufficient
statistic that is expressible as a linear combination of the {x_i}. (More formally, if
M is used to denote the loglinear model specified by m = m(\theta), then the MSS's
are given by the projection of x onto M, i.e. P_M x.)


RESULT 3. The maximum likelihood estimate under the Poisson model, \hat{m}, of m
= exp \lambda(\theta), if it exists, is unique and satisfies the likelihood equations:

    P_M \hat{m} = P_M x,    (2.13)

i.e. the MLE is found by setting the minimal sufficient statistics equal to their
expectations.


We note that the MLE \hat{\theta} of \theta is defined implicitly via the MLE \hat{m} of m = exp
\lambda(\theta) in expression (2.13). In the statement of Result 3, we assume that \hat{m} exists.
Necessary and sufficient conditions for the existence of MLE's are relatively complex,
and we refer the interested reader to Haberman (1974) for details.
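
The likelihood equations (2.13) can be illustrated with a small numerical sketch. Here the subspace M is represented by the columns of a design matrix `A` (an assumption of this sketch, so that \lambda = A\theta), in which case P_M \hat{m} = P_M x is equivalent to A'\hat{m} = A'x. The example fits the independence model to a made-up 2x2 table by Newton-Raphson; the function name, starting values, and data are all hypothetical choices, not taken from the sources cited above.

```python
import numpy as np

def fit_poisson_loglinear(x, A, n_iter=50, tol=1e-10):
    """Newton-Raphson for the Poisson loglinear model log m = A @ theta."""
    x = np.asarray(x, dtype=float)
    theta = np.zeros(A.shape[1])
    theta[0] = np.log(x.mean())          # assumes column 0 of A is the constant term
    for _ in range(n_iter):
        m = np.exp(A @ theta)
        score = A.T @ (x - m)            # gradient of the Poisson log-likelihood
        info = A.T @ (m[:, None] * A)    # Fisher information A' diag(m) A
        step = np.linalg.solve(info, score)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta, np.exp(A @ theta)

# Illustrative 2x2 table in cell order (1,1), (1,2), (2,1), (2,2).
x = np.array([10.0, 20.0, 30.0, 40.0])
# Columns: constant, row effect, column effect (+1/-1 coding).
A = np.array([[1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0],
              [1.0, -1.0, -1.0]])

theta_hat, m_hat = fit_poisson_loglinear(x, A)
print(m_hat)                    # fitted expected counts
print(A.T @ m_hat, A.T @ x)     # sufficient statistics: fitted versus observed
```

For the independence model the fitted values should agree with the familiar closed form (row total times column total over grand total), and the fitted and observed sufficient statistics should coincide, as Result 3 asserts.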


For product-multinomial sampling situations, the basic multinomial constraints (i.e.,
that the counts must add up to the multinomial sample sizes) must be taken into
account. Thus we need to ensure that the constraints (2.11) are in fact satisfied. To
do so, we let M* be a loglinear model for m under product-multinomial sampling which
corresponds to a loglinear model M under Poisson sampling, such that the multinomial
constraints (2.11) "fix" a subset of the parameters, \theta, used to specify M. Then

RESULT 4. The MLE of m under product-multinomial sampling for the model
M* is the same as the MLE of m under Poisson sampling for the model M.

Result 4 follows directly from Results 1, 2, and 3, and forms the basis of the unified
approach to loglinear model problems, with and without multinomial constraints, as
described in Bishop, Fienberg, and Holland (1975). Woolson and Brier (1981) show that
a similar result holds for estimates of m (and thus \theta) derived using the weighted least
squares approach of Grizzle, Starmer, and Koch (1969). The key to the result in both
cases is the loglinear structure of the parametric model, and the exponential family
representation of the sampling model.


It  is  interesting  to  note  that  Fisher  implicitly  exploited  Result  4  in  his  discussion  of 
the  degrees  of  freedom  of  the  Pearson  chi-square  statistic  for  2xN  contingency  tables 
(Fisher, 1922b). The generalization of Fisher's formulation of the chi-square problem
has  led  to  the  following  well-known  theorem. 


RESULT 5. If \hat{m} is the MLE of m under a loglinear model, and if the model is
correct, then the statistics

    X^2 = \sum_{i=1}^{t} (x_i - \hat{m}_i)^2 / \hat{m}_i    (2.14)

and

    G^2 = 2 \sum_{i=1}^{t} x_i \log (x_i / \hat{m}_i)    (2.15)

have asymptotic \chi^2 distributions with t-s degrees of freedom, where s is the total
number of independent constraints implied by the loglinear model and the
multinomial sampling constraints (2.11) (if any). If the model is not correct then
X^2 and G^2, in (2.14) and (2.15), are stochastically larger than \chi^2_{t-s}.


In Result 5, X^2 is the usual Pearson chi-square statistic for testing goodness of fit, and G^2 is
minus twice the loglikelihood ratio comparing the restricted model m = exp \lambda(\theta) to the
unrestricted model. Fisher (1922a) had noted the asymptotic equivalence of X^2 and G^2
in certain situations, and suggested that the Pearson statistic X^2 achieved its validity
because it is an approximation to the loglikelihood ratio statistic.
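
The two statistics (2.14) and (2.15) are simple to compute once fitted values are available. The sketch below uses hypothetical function names and a made-up 2x2 table with its independence-model fitted values; cells with x_i = 0 are taken to contribute nothing to G^2, which is the usual convention.

```python
import numpy as np

def pearson_x2(x, m_hat):
    """Pearson statistic (2.14): sum of (x_i - m_i)^2 / m_i over the cells."""
    x, m_hat = np.asarray(x, dtype=float), np.asarray(m_hat, dtype=float)
    return float(np.sum((x - m_hat) ** 2 / m_hat))

def likelihood_ratio_g2(x, m_hat):
    """Likelihood ratio statistic (2.15): 2 * sum of x_i * log(x_i / m_i)."""
    x, m_hat = np.asarray(x, dtype=float), np.asarray(m_hat, dtype=float)
    pos = x > 0                      # empty cells contribute zero
    return float(2.0 * np.sum(x[pos] * np.log(x[pos] / m_hat[pos])))

# Illustrative 2x2 table and its fitted values under independence
# (row total * column total / grand total); both are assumed example data.
x = [10, 20, 30, 40]
m_hat = [12.0, 18.0, 28.0, 42.0]

x2 = pearson_x2(x, m_hat)
g2 = likelihood_ratio_g2(x, m_hat)
print(round(x2, 4), round(g2, 4))
```

In this small example the two values are close to one another, in line with Fisher's remark on their asymptotic equivalence; here the independence model leaves 1 degree of freedom.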

3. LOGLINEAR MODELS, MARGINAL TOTALS, AND THE REPORTING OF
SURVEY DATA

The loglinear model theory described in the preceding section was developed primarily
to deal with the analysis of multidimensional cross-classified tables of counts. In this
section, we review how the results of Section 2 can be applied to such tables, and in
the course of doing so we draw conclusions about the reporting of large scale national
probability samples of the type carried out by government agencies and others around the
world.


We begin with a simple biomedical example. An experiment was designed to study
the effects of two analgesic drugs on post-partum pain of women who had experienced
normal deliveries. A total of 718 women were studied and they were assigned to one
of four treatment groups:

A_1B_1 - 0 dosage of drug A and drug B, i.e. placebo

A_1B_2 - 100 mg. of drug B

A_2B_1 - 200 mg. of drug A

A_2B_2 - 200 mg. of drug A and 100 mg. of drug B.

The outcome variable for the study was reduction of pain (or change):

C_1 - no reduction
C_2 - reduction.

The resulting data form the 2x2x2 cross-classification given in Table 3-1, part (a).

TABLE 3-1

The Results of an Experiment Involving Two Analgesic
Drugs Intended to Reduce Post-Partum Pain

(a) observed counts, {x_ijk}

                                 Pain Change
Level of     Level of
Drug B       Drug A          C_1        C_2       Totals

B_1          A_1              55        115         170
             A_2              44        132         176
B_2          A_1              33        154         187
             A_2              25        160         185

Grand Total                                         718

(b) estimated expected counts, {\hat{m}_ijk}, under model (3.2) and (3.3) subject
to constraints (3.1)

                                 Pain Change
Level of     Level of
Drug B       Drug A          C_1        C_2       Totals

B_1          A_1             54.7      115.3        170
             A_2             44.3      131.7        176
B_2          A_1             33.3      153.7        187
             A_2             24.7      160.3        185



For the data in Table 3-1, the totals for AxB are fixed by design (the totals differ
somewhat from one another due to the manner in which the study was conducted). We
are interested in the effects of drugs A and B on the response variable C. Let

    x_{ijk} = no. of women in group A_iB_j who respond C_k.

Then the two-way totals, adding over k, are fixed, i.e.

    m_{ij+} = x_{ij+},  i,j = 1,2,    (3.1)

where a "+" implies summation over the corresponding subscript. Expression (3.1)
corresponds to the product-multinomial constraints (2.11).


One possible model for the data of Table 3-1 is

    \log (m_{ij1} / m_{ij2}) = w + w_{1(i)} + w_{2(j)},    (3.2)

where

    \sum_i w_{1(i)} = \sum_j w_{2(j)} = 0.    (3.3)

Model (3.2) is referred to as a logit model and it postulates the additive effects of
drugs A and B on the logarithm of the odds of pain change, m_{ij1}/m_{ij2}. Using Result 4
of Section 2, we can also represent the logit model of (3.2) equivalently as a loglinear
model for {m_{ijk}}, i.e.

    \log m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)},    (3.4)

with the usual ANOVA constraints that whenever a u-term is summed over a subscript
the sum equals zero, e.g.

    \sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0.    (3.5)

Since (3.4) is subject to the constraints of equation (3.1), u, {u_{1(i)}}, {u_{2(j)}}, and {u_{12(ij)}}
are in effect fixed by design, while

    w = 2u_{3(1)},  w_{1(i)} = 2u_{13(i1)},  and  w_{2(j)} = 2u_{23(j1)}.    (3.6)



The minimal sufficient statistics for model (3.2) (or (3.4) subject to (3.1)) are the
three sets of two-way marginal totals:

    {x_{ij+}}, {x_{i+k}}, {x_{+jk}},    (3.7)

and, using Result 3, the likelihood equations are:

    \hat{m}_{ij+} = x_{ij+},  i,j = 1,2,
    \hat{m}_{i+k} = x_{i+k},  i,k = 1,2,
    \hat{m}_{+jk} = x_{+jk},  j,k = 1,2.    (3.8)

The solution to the likelihood equations does not have a closed-form expression and
some form of numerical technique is required, such as iterative proportional fitting (e.g.
see Andersen, 1980; Bishop, Fienberg, and Holland, 1975; or Haberman, 1974, 1978).

Part (b) of Table 3-1 displays the MLE's, {\hat{m}_{ijk}}, for our example. The
goodness-of-fit statistics, (2.14) and (2.15), take values

    X^2 = 0.014,  G^2 = 0.014,

with 1 d.f. Comparing these values with various tail values of the \chi^2_1 distribution, we
see that model (3.2) fits the data extremely well. Thus the summary of the 2x2x2
array {x_{ijk}} in terms of the minimal sufficient statistics (3.7) is a meaningful one. By
reporting only the two-way marginal totals, we provide others with "sufficient
information" to estimate the parameters of interest. In fact, reduced models also fit
the data in Table 3-1 extremely well, and thus we can express the "sufficient
information" even more compactly.
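
As an illustration of how the likelihood equations (3.8) can be solved, the following sketch applies iterative proportional fitting to the observed counts of Table 3-1(a). It cycles through the three sets of two-way totals in (3.7), rescaling the fitted table to match each in turn. The function name, the starting table of ones, and the stopping rule are choices made for this sketch rather than details taken from the references above.

```python
import numpy as np

# Observed counts x[i, j, k] from Table 3-1(a):
# i = level of drug A, j = level of drug B, k = pain change (C_1, C_2).
x = np.array([[[55.0, 115.0], [33.0, 154.0]],
              [[44.0, 132.0], [25.0, 160.0]]])

def ipf_no_three_factor(x, n_iter=500, tol=1e-10):
    """Fit the no-three-factor-interaction model by matching two-way totals."""
    m = np.ones_like(x)
    for _ in range(n_iter):
        m_old = m.copy()
        m *= (x.sum(axis=2) / m.sum(axis=2))[:, :, None]   # match A x B totals
        m *= (x.sum(axis=1) / m.sum(axis=1))[:, None, :]   # match A x C totals
        m *= (x.sum(axis=0) / m.sum(axis=0))[None, :, :]   # match B x C totals
        if np.max(np.abs(m - m_old)) < tol:
            break
    return m

m_hat = ipf_no_three_factor(x)
x2 = float(np.sum((x - m_hat) ** 2 / m_hat))
g2 = float(2.0 * np.sum(x * np.log(x / m_hat)))
print(np.round(m_hat, 1))
print(round(x2, 3), round(g2, 3))
```

The fitted counts should come out close to those in part (b) of the table, and both statistics close to the reported value of 0.014, allowing for rounding.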

The ideas just described in the context of the 2x2x2 table generalize in a
straightforward fashion to loglinear models for tables of more than 3 dimensions.
Suppose we are interested in reporting the results of a national simple random sample
of adults, age 25 or older, conducted to provide information on the interrelationship
between educational achievement (variable 1 measured in terms of 4 categories), and
occupational satisfaction (variable 2 with 3 categories), and how it varies with sex
(variable 3 with 2 categories) and ethnic origin (variable 4 with, say, 8 categories). We
have a single multinomial sample, but the models of interest are ones that condition on
the "background variables," sex and ethnic origin. Thus, in analyzing the resulting
4x3x2x8 cross-classification, we would focus on models conditional on the sex by
ethnic origin totals,

    {x_{++kl}}.    (3.9)


An example of a loglinear model for the array of expected cell counts {m_{ijkl}} is

    \log m_{ijkl} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{4(l)}
                  + u_{13(ik)} + u_{14(il)} + u_{23(jk)} + u_{24(jl)} + u_{34(kl)}.    (3.10)

This model postulates simultaneous interrelationships between each of the two "response"
variables (variables 1 and 2) and each of the two "explanatory" variables (variables 3
and 4), as well as between the two explanatory variables themselves. This model does
not include any of the four terms that are interpretable as second-order interactions
involving 3 variables, nor does it include the 4-variable, third-order interaction.
Models containing such terms might be of interest to us, however, as they share with
(3.10) several desirable features from the viewpoint of reporting of survey results.

For loglinear models of the sort being considered here, the minimal sufficient
statistics always take the form of sets of marginal totals. In our particular example,
they are the five two-dimensional marginal tables corresponding to the five two-factor
terms in the model: the marginal tables for educational achievement by sex, {x_{i+k+}},
corresponding to {u_{13(ik)}}; educational achievement by ethnic group, {x_{i++l}},
corresponding to {u_{14(il)}}; occupational satisfaction by sex, {x_{+jk+}}, corresponding to
{u_{23(jk)}}; occupational satisfaction by ethnic group, {x_{+j+l}}, corresponding to
{u_{24(jl)}}; and sex by ethnic group, {x_{++kl}}, corresponding to {u_{34(kl)}}. If we were to
report only these five two-way tables (along with a description of our model) then it would be
possible for a reader with appropriate statistical training to construct a four-dimensional
table sufficiently close to the observed table that he would suffer essentially zero
information loss (in the Fisherian sense), provided that the model fits the data.
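
The degree of data reduction in this example can be made concrete with a little arithmetic. The sketch below counts the cells of the 4x3x2x8 table, the entries in the five two-way marginal tables, and the independent parameters in a model with a constant, four main effects, and the five two-factor terms listed above. The variable names are hypothetical, and the residual degrees of freedom are computed under the usual convention of cells minus independently fitted parameters.

```python
from math import prod

# Number of categories for variables 1-4 (education, satisfaction, sex, ethnic origin).
levels = {1: 4, 2: 3, 3: 2, 4: 8}
# The five two-factor terms of the model: u_13, u_14, u_23, u_24, u_34.
two_factor_terms = [(1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

cells = prod(levels.values())

# Entries reported if only the five two-way marginal tables are published.
margin_entries = sum(levels[a] * levels[b] for a, b in two_factor_terms)

# Independent parameters under the zero-sum (ANOVA-type) constraints:
# constant + main effects + two-factor terms.
n_params = (1
            + sum(n - 1 for n in levels.values())
            + sum((levels[a] - 1) * (levels[b] - 1) for a, b in two_factor_terms))

residual_df = cells - n_params
print(cells, margin_entries, n_params, residual_df)
```

The five marginal tables contain far fewer entries than the full cross-classification, and many of those entries are themselves redundant because the tables share one-way totals.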

The implications of the use of loglinear models for the analysis and reporting of
multidimensional cross-classified survey data are thus relatively clear:

(1) By the use of model building we are often led to particular forms of summary
appropriate for our data.

(2) In the case of cross-classified data and loglinear models this summary takes the
form of certain sets of marginal totals, specified by the model.

(3) If we report all of the marginal totals appropriate for a loglinear model that
fits the data well, then another investigator can, in effect, reconstruct the data
with little or no loss in information.

Few  government  or  other  survey  organizations  adopt  such  a  model-based  approach  to 
analysis  and  reporting,  and  we  are  usually  left  to  ponder  the  relevance  of  tables  that 
are  reported. 

The approach to reporting just described for survey-based cross-classified data
assumed that we are dealing with either a simple random sample, or perhaps with a
stratified random sample, where the variables underlying the strata (if they have any
intrinsic interest) are included amongst the explanatory variables in the loglinear models.
The analysis and reporting of categorical data from sample designs involving clustering
or unequal probabilities of selection is more complex (see e.g., Brier, 1980; Fellegi, 1980;
and Rao and Scott, 1981), but the principles behind the reporting remain the same. We
should not report summaries of a survey involving categorical variables only in a form
which prevents others from reconstructing what is essentially an equivalent version of
the original data or some subset thereof (i.e. summaries that do not include an
appropriate set of minimal sufficient statistics). This is the type of practical advice
that I believe Fisher might have given had he been more extensively involved in the
analysis of survey data!

4. THE USE OF LOGLINEAR MODELS FOR SOME "NON-CONTINGENCY TABLE"
PROBLEMS

The application of the loglinear model results from Section 2 to multidimensional
contingency tables focussed on models where each set of the parameters in the
logarithmic scale is associated with one or more dimensions of the table. One of the
values of general theoretical results is that they are often applicable to specific settings
beyond those which led to the formulation of the general structure. This is certainly
true for results on the analysis of categorical data problems. Fortunately many of the
"non-contingency table" applications of the loglinear model results have contingency
table-like representations so that we can interpret the results of our analyses using
whatever intuition we have gleaned from the analysis of contingency table data using
loglinear models.

4.1 THE BRADLEY-TERRY PAIRED COMPARISONS MODEL

To illustrate this approach let us consider the Bradley-Terry model for binary paired
comparisons, a statistical topic which has been studied extensively for almost three
decades (for an excellent review of this literature see Bradley, 1976). Suppose t items
(e.g., different types of chocolate pudding) or treatments, labeled T_1, T_2, \ldots, T_t, are
compared in pairs by sets of judges. (Or suppose that t football teams compete in
pairs in a series of matches.) The Bradley-Terry model postulates that the probability
of T_i being preferred to T_j is

    \Pr(T_i \to T_j) = \pi_i / (\pi_i + \pi_j),  i,j = 1, 2, \ldots, t,    (4.1)

where each \pi_i > 0 and we add the constraint that \sum_i \pi_i = 1. The model assumes
independence of the same pair by different judges and different pairs by the same
judge. In the example of the football matches we assume the independence of
outcomes of the matches.

TABLE 4-1

Layout for Data in Paired-Comparisons Study with t = 4

                      Against
             T_1     T_2     T_3     T_4
      T_1     -     x_12    x_13    x_14
For   T_2    x_21     -     x_23    x_24
      T_3    x_31    x_32     -     x_34
      T_4    x_41    x_42    x_43     -


In the typical paired comparison experiment, T_i is compared with T_j n_{ij} \ge 0 times,
and we let x_{ij} be the observed number of times T_i is preferred to T_j in these n_{ij}
comparisons. Table 4-1 shows the typical layout for the observed data when t = 4,
with preference (for, against) defining rows and columns. Clearly the binomial
constraint,

    x_{ij} + x_{ji} = n_{ij},    (4.2)

is of the form (2.9), and we can apply Result 4 of Section 2 to convert (4.1) into a
model for expected values for a Poisson sampling setting, i.e.

    \log m_{ij} = \alpha_i + \beta_j + \gamma_{ij},    (4.3)

where

    \gamma_{ij} = \gamma_{ji},    (4.4)

with suitable side constraints. But this, as was noted in Fienberg and Larntz (1976), is
simply the model of quasi-symmetry in a square contingency table (see Bishop,
Fienberg, and Holland, 1975, Chapter 8). The minimal sufficient statistics are (from
Result 2)

    {x_{i+}}, {x_{+j}}, {x_{ij} + x_{ji}}    (4.5)


(actually either the row or column totals are redundant), and we can use a trick,
suggested in Bishop, Fienberg, and Holland, to transform the problem to one for a
three-way table of expected counts. We generate duplicate tables and set

    m_{ijk} = m_{ij}  for k = 1,
    m_{ijk} = m_{ji}  for k = 2,    (4.6)

and, for the observed counts,

    x_{ijk} = x_{ij}  for k = 1,
    x_{ijk} = x_{ji}  for k = 2.    (4.7)

Then the loglinear version of the Bradley-Terry model given by (4.3) and (4.4) becomes
the model of no-second-order interaction in the new 3-dimensional table, whose
minimal sufficient statistics are ({x_{ij+}}, {x_{i+k}}, {x_{+jk}}). Thus we can analyze the fit of
the model and variations on it in a familiar contingency table setting of the sort
described in Section 3.
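
A small numerical sketch may help fix ideas. It builds the duplicated three-way table of (4.7) from a made-up 4x4 table of preference counts, and then estimates the Bradley-Terry parameters \pi_i directly. Note that the estimation step uses the classical fixed-point (Zermelo-type) iteration rather than the contingency-table fitting route described above; it is swapped in here only because it is short. The data, function name, and stopping rule are all assumptions of this sketch.

```python
import numpy as np

# Illustrative preference counts: x[i, j] = times item i was preferred to item j.
x = np.array([[0.0, 7.0, 6.0, 8.0],
              [3.0, 0.0, 5.0, 7.0],
              [4.0, 5.0, 0.0, 6.0],
              [2.0, 3.0, 4.0, 0.0]])
n = x + x.T                      # comparisons per pair, the constraint (4.2)

# Duplicated three-way table: layer k = 1 holds x_ij, layer k = 2 holds x_ji.
x3 = np.stack([x, x.T], axis=2)

def fit_bradley_terry(x, n_iter=10000, tol=1e-12):
    """Fixed-point iteration for the Bradley-Terry MLE, normalized to sum to 1."""
    n = x + x.T
    wins = x.sum(axis=1)
    pi = np.full(x.shape[0], 1.0 / x.shape[0])
    for _ in range(n_iter):
        # Diagonal terms vanish because n[i, i] = 0.
        denom = (n / (pi[:, None] + pi[None, :])).sum(axis=1)
        new = wins / denom
        new /= new.sum()
        converged = np.max(np.abs(new - pi)) < tol
        pi = new
        if converged:
            break
    return pi

pi_hat = fit_bradley_terry(x)
# Expected number of wins for each item under the fitted model.
expected_wins = (n * pi_hat[:, None] / (pi_hat[:, None] + pi_hat[None, :])).sum(axis=1)
print(np.round(pi_hat, 4))
print(np.round(expected_wins, 4), x.sum(axis=1))
```

At convergence the expected number of wins for each item should match the observed row totals {x_{i+}}, which is the Bradley-Terry instance of setting sufficient statistics equal to their expectations.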

These results on the loglinear representation for the Bradley-Terry model are by now
reasonably well-known, and they can be extended to more complex settings involving
ties, multiple comparisons, and rankings. Recent results by Meyer (1981) are of special
use in giving contingency table representations to some of these generalizations. For the
remainder of this section we describe two other classes of categorical data problems
where loglinear models are proving to be useful, and for which standard contingency
table representations are especially helpful for both theoretical and computational
reasons.

4.2.  MODELS  FOR  SOCIAL  NETWORKS 

A  directed  graph  consists  of  a  set  of  g  nodes,  and  a  collection  of  directed  arcs 
connecting  pairs  of  nodes.  Such  graphs  have  been  used  to  depict  social  networks 
describing  relationships  between  pairs  of  individual  actors.  Figure  4-1  contains  an 
example of such a graph for the relationship "social friendship," for 12 5th grade boys.




Each boy was asked to name the two boys with whom he was the friendliest outside
the classroom. Table 4-2 summarizes the information from the directed graph of
Figure 4-1 in the form of a 12x12 sociomatrix or adjacency matrix, x, with elements

    x_{ij} = 1  if i chooses j as his friend,
    x_{ij} = 0  otherwise,    (4.8)

where by convention, the diagonal terms x_{ii} = 0.


Holland and Leinhardt (1981) note that for any pair or dyad in a network, with
adjacency matrix x,

    x_{ij} x_{ji} + x_{ij}(1 - x_{ji}) + (1 - x_{ij}) x_{ji} + (1 - x_{ij})(1 - x_{ji}) = 1,    (4.9)

for i \ne j, and that exactly one of the terms on the left hand side of (4.9) is 1 and the
remaining three are 0. They then suggest the following model to describe these
outcomes (using X as the matrix of random variables of which the adjacency matrix x
is a realization):


    \log \Pr[(1 - X_{ij})(1 - X_{ji}) = 1] = \lambda_{ij}
    \log \Pr[(1 - X_{ij}) X_{ji} = 1] = \lambda_{ij} + \alpha_j + \beta_i + \theta
    \log \Pr[X_{ij}(1 - X_{ji}) = 1] = \lambda_{ij} + \alpha_i + \beta_j + \theta
    \log \Pr[X_{ij} X_{ji} = 1] = \lambda_{ij} + \alpha_i + \alpha_j + \beta_i + \beta_j + 2\theta + \rho_{ij}    (4.10)

where the {\lambda_{ij}} are "dyadic" effects included here (but only implicitly in Holland and
Leinhardt) to assure that the multinomial constraint (4.9) is satisfied, and where

    \sum_i \alpha_i = \sum_j \beta_j = 0.    (4.11)

There are too many parameters in this model for complete identification, and so
Holland and Leinhardt set

    \rho_{ij} = \rho.    (4.12)

They refer to the resulting model as p_1.

TABLE 4-2

Sociomatrix for Social Friendship Among 12 5th Grade Boys

[The 12 x 12 sociomatrix, with rows and columns labeled A through L, is not legible in this copy.]


FIGURE  4-1 

Sociogram  or  Directed  Graph  Representing  Data  in  Sociomatrix  of  Table  4-2 
If we assume that the dyads are independent, then we have a product-multinomial
sampling model with one observation per multinomial. (This model doesn't yet take
into account the extra constraints in the data of Table 4-2 where the row sums of $x$
are all restricted to equal 2.) Holland and Leinhardt make direct use of the
exponential family theory results on maximum likelihood estimation (cf. Section 2) to
estimate the parameters in $p_1$. Fienberg and Wasserman (1981a, 1981b) note, however,
that there is a direct link between the $p_1$ model and a loglinear model for a
multi-dimensional table representation of the probabilities in (4.10). In particular, they work
with the four-dimensional array:

$$\begin{aligned}
X_{ij11} &= X_{ij}X_{ji}, \\
X_{ij10} &= X_{ij}(1-X_{ji}), \\
X_{ij01} &= (1-X_{ij})X_{ji}, \\
X_{ij00} &= (1-X_{ij})(1-X_{ji}).
\end{aligned} \tag{4.13}$$

Note that

$$X_{ijkl} = X_{jilk} \tag{4.14}$$

because the dyad $(i,j)$ is the same as the dyad $(j,i)$. Thus, if $\{x_{ijkl}\}$ is a realization of
$\{X_{ijkl}\}$ we only need to consider one "triangle" of $\{x_{ijkl}\}$ in which $i > j$. But by
retaining all $4g^2$ cells in the $g \times g \times 2 \times 2$ table we are able to express the minimal sufficient
statistics for the parameters of $p_1$ as marginal totals of $\{x_{ijkl}\}$:

$$\begin{aligned}
x_{++11} &= \sum_{i,j} x_{ij}x_{ji}, \\
x_{i+1+} &= x_{i+}, \quad i = 1, 2, \ldots, g, \\
x_{+j1+} &= x_{+j}, \quad j = 1, 2, \ldots, g, \\
x_{++1+} &= x_{++}.
\end{aligned} \tag{4.15}$$

Finally, by coupling (4.15) with (4.9) and (4.14), and then reexpressing, we can get an
alternative set of sufficient statistics:

$$\{x_{ij++}\},\ \{x_{i+k+}\},\ \{x_{i++l}\},\ \{x_{+jk+}\},\ \{x_{+j+l}\},\ \{x_{++kl}\} \tag{4.16}$$

(allowing for redundancies resulting from symmetries and duplications). But (4.16) is
the set of six two-dimensional marginal totals of the four-dimensional array, and it can
be shown (Meyer, 1981) that fitting $p_1$ to $x = \{x_{ij}\}$ is equivalent to fitting the
no-second-order interaction model to the newly created redundant array $\{x_{ijkl}\}$.
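The construction of the redundant four-dimensional array can be sketched in a few lines. The code below builds the g×g×2×2 indicator array from a 0/1 adjacency matrix and computes the marginal totals that reduce to the row sums, column sums and mutual-choice count of the sociomatrix. The 4-actor adjacency matrix is a hypothetical example (it is not the data of Table 4-2), and the function and variable names are assumptions made for illustration.

```python
def dyad_array(x):
    """Build the g x g x 2 x 2 indicator array from a 0/1 adjacency matrix x."""
    g = len(x)
    arr = [[[[0, 0], [0, 0]] for _ in range(g)] for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                # Exactly one cell (k, l) = (x_ij, x_ji) equals 1 per ordered pair.
                arr[i][j][x[i][j]][x[j][i]] = 1
    return arr

# Hypothetical directed graph on 4 actors (diagonal entries are 0 by convention).
x = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
]
g = len(x)
arr = dyad_array(x)

# Margins of the array with k = 1 recover the out-degrees and in-degrees.
row_margin = [sum(arr[i][j][1][l] for j in range(g) for l in (0, 1)) for i in range(g)]
col_margin = [sum(arr[i][j][1][l] for i in range(g) for l in (0, 1)) for j in range(g)]
# The (k, l) = (1, 1) margin counts mutual choices, once for each ordering of the pair.
mutual = sum(arr[i][j][1][1] for i in range(g) for j in range(g))

print("row sums:", row_margin)
print("column sums:", col_margin)
print("sum of x_ij * x_ji:", mutual)
```

Keeping both (i, j) and (j, i) doubles the storage, but it lets every sufficient statistic be read off as an ordinary marginal total, which is what allows standard loglinear fitting routines to be applied.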




This standard contingency table representation for Holland and Leinhardt's $p_1$ model
leads to superior numerical solutions to the likelihood equations. It also leads naturally
to a generalization of $p_1$ where

$$\rho_{ij} = \rho + \rho_i + \rho_j, \quad i > j. \tag{4.17}$$

Fitting this model to $\{x_{ij}\}$ is equivalent to fitting the standard loglinear model to $\{x_{ijkl}\}$
with minimal sufficient statistics

$$\{x_{ij++}\},\ \{x_{i+kl}\},\ \{x_{+jkl}\}. \tag{4.18}$$

We now return to the data in Table 4-2 on social friendships amongst 12 grade 5
boys, and recall that the row totals were fixed to equal 2, by design. This leads to a
relatively complex hypergeometric sampling scheme, but we can approximate results for
it by using the methods for $p_1$ just described and then focus only on the parameters
$\{\beta_j\}$ and $\rho$. Our analysis of the data in Table 4-2 is relatively straightforward.

Measuring the fit of Holland and Leinhardt's $p_1$ model using the likelihood ratio
criterion of expression (2.15), we get $G^2 = 104.15$ with 98 d.f. (The general formula for
d.f. is $g(g-1)$ and $g = 12$, but we need to adjust here for the zero marginal total in the
6th column.) Next we fit the "differential reciprocity" model, (4.17), whose fit is
summarized by $G^2 = 92.84$ with 87 d.f. (the d.f. calculation here is quite problematic,
but the results do not depend on a precise calculation). Thus we can check on the fit
of $p_1$ to the data in Table 4-2 by taking

$$\Delta G^2 = G^2_{p_1} - G^2_{(4.17)} = 104.15 - 92.84 = 11.31$$

with "approximately" 11 d.f. The $p_1$ model fits reasonably well. The boys who attract
the most friendship (e.g. boys 2, 3, 9, 10, and 11) do not appear to reciprocate in a
differential manner from those who attract little friendship, given that we adjust for
their differing levels of attractiveness.
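The comparison of the two nested fits can be checked numerically. The sketch below recomputes the difference in likelihood ratio statistics from the two values of G² quoted in the text and refers it to a chi-square distribution on the difference in degrees of freedom. The helper `chi2_sf` is a small illustrative implementation written for this example (a series for the incomplete gamma function), not a routine from any package mentioned in the lecture, and the chi-square reference distribution is itself only approximate here because of the sparseness of the data.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df d.f. > x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the lower incomplete gamma function:
    # gamma(a, z) = z**a * exp(-z) * sum_{n>=0} z**n / (a (a+1) ... (a+n)).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

g2_p1, df_p1 = 104.15, 98            # fit of p1, as reported in the text
g2_general, df_general = 92.84, 87   # fit of the differential reciprocity model
delta_g2 = g2_p1 - g2_general
delta_df = df_p1 - df_general
p_value = chi2_sf(delta_g2, delta_df)
print(f"delta G^2 = {delta_g2:.2f} on {delta_df} d.f., tail probability = {p_value:.3f}")
```

A tail probability that is not small is what one would expect if the extra reciprocity parameters add little beyond $p_1$.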

What  is  especially  attractive  about  the  multi-dimensional  contingency  table 
representation  of  the  social  network  data  problem  as  outlined  here  is  that  it  carries 
over to networks involving multiple relationships. For details, see Fienberg, Meyer, and
Wasserman (1981). Yet this type of representation is not a panacea. The sparseness of
the array $\{x_{ijkl}\}$ makes the application of the usual asymptotics, and in particular Result
5 of Section 2, problematic at best. The array $\{x_{ijkl}\}$ is of size $4g^2$, but $x_{++++} = 2g(g-1)$,
and the $p_1$ model has $2g$ parameters. For a more detailed discussion of the relevant
asymptotics for this problem see Fienberg and Wasserman (1981a) and Haberman (1981).


4.3 THE RASCH MODEL

We now turn to yet another problem which begins with a representation as a two-way
table of 0's and 1's, and ends up as a relatively standard multi-dimensional
contingency table problem. The results of ability tests are often structured in the
form of sequences of 1's for correct answers and 0's for incorrect answers. For a test
with $k$ problems or items administered to $n$ individuals, we let

$$Y_{ij} = \begin{cases} 1 & \text{if individual } i \text{ answers item } j \text{ correctly,} \\ 0 & \text{otherwise.} \end{cases} \tag{4.19}$$

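Thus we have a two-way table of random variables $\{Y_{ij}\}$ with realizations $\{y_{ij}\}$. An
alternative representation of the data is in the form of an $n \times 2^k$ table $\{W_{i j_1 j_2 \ldots j_k}\}$ where
the subscript $i$ still indexes individuals and now $j_1, j_2, \ldots, j_k$ refer to the correctness of the
responses on items $1, 2, \ldots, k$, respectively, i.e.

$$W_{i j_1 j_2 \ldots j_k} = \begin{cases} 1 & \text{if } i \text{ responds } (j_1, j_2, \ldots, j_k), \\ 0 & \text{otherwise.} \end{cases} \tag{4.20}$$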


The Rasch model (Rasch, 1960, as reprinted in 1980; Birnbaum, 1957) for the $\{Y_{ij}\}$ is

$$\log \frac{P(Y_{ij}=1)}{P(Y_{ij}=0)} = \psi + \mu_i + \nu_j \tag{4.21}$$

where

$$\sum_i \mu_i = \sum_j \nu_j = 0. \tag{4.22}$$

Differences of the form $\mu_i - \mu_r$ are typically described as measuring the relative
abilities of individuals $i$ and $r$, while those of the form $\nu_j - \nu_s$ are described as
measuring the relative difficulties of items $j$ and $s$. Expression (4.21) is a logit model
in the usual contingency table sense for a 3-dimensional array whose first layer is $\{y_{ij}\}$
and whose marginal total adding across layers is an $n \times k$ table of 1's. Because the
Rasch model depends on the item parameters in a non-linear way, it is not at all clear
whether we can collapse the array $\{w_{i j_1 j_2 \ldots j_k}\}$ by adding over subjects for estimation
purposes. We return to this matter below.
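A small numerical sketch may help fix the interpretation of the parameters. It assumes the additive logit form with an overall level, an individual effect and an item effect, each set of effects summing to zero; the names `psi`, `mu`, `nu` and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def rasch_prob(psi, mu_i, nu_j):
    """P(Y_ij = 1) when the logit equals psi + mu_i + nu_j."""
    return 1.0 / (1.0 + math.exp(-(psi + mu_i + nu_j)))

def logit(p):
    return math.log(p / (1.0 - p))

psi = 0.2
mu = [-1.0, 0.0, 1.0]   # hypothetical individual effects, summing to zero
nu = [0.5, -0.5]        # hypothetical item effects, summing to zero

probs = [[rasch_prob(psi, m, v) for v in nu] for m in mu]

# The logit difference between individuals 2 and 0 is the same for every item,
# and equals mu[2] - mu[0].
ability_gaps = [logit(probs[2][j]) - logit(probs[0][j]) for j in range(len(nu))]
# Likewise the logit difference between items 0 and 1 is the same for every individual.
item_gaps = [logit(probs[i][0]) - logit(probs[i][1]) for i in range(len(mu))]

print(ability_gaps)
print(item_gaps)
```

The constancy of these gaps across items (or across individuals) is the sense in which differences of parameters measure relative ability and relative difficulty.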
Duncan (1982) has proposed that we should view certain types of survey data in much
the same way as we do ability test data. For example, he describes a 4-item scale
included in a survey pertaining to beliefs about effects of marijuana. If we can
consider these items in isolation from the rest of the survey questions (see the
discussion of this in Section 3 on reporting), then we can display the relevant data as
an $n \times 4$ array of the form (4.19), and we can explore the appropriateness of the Rasch
model as a description of the observed data. In the context of Duncan's examples the
individual parameters, $\{\mu_i\}$, can be thought of as values for a "latent trait" of the
survey respondents in much the same way as psychometricians have interpreted these
parameters as measuring the single latent trait, ability. Duncan discusses the matter, not
considered here, of structuring the $\mu$'s according to multiple dimensions, and he links
the notion of background variables and stratification to differing latent trait structures.

Maximum likelihood estimation for the parameters of the Rasch model (4.21) has
been the focus of several authors including Rasch and Andersen. Unconditional
maximum likelihood (UML) estimates can be derived but they have rather problematic
asymptotic properties, e.g. the estimates are inconsistent as $n \to \infty$ and $k$ remains
moderate, although they are consistent when both $n$ and $k \to \infty$ (Haberman, 1977).


Before turning to an alternative to the UML approach, we point out a recently
derived result for UML estimates for the Rasch model which links up in yet another
way with loglinear structures for contingency tables. In order to derive necessary and
sufficient conditions for the existence of UML estimates (a problem not really discussed
for any of the data structures in this paper), Fischer (1981) embeds the matrix $y = \{y_{ij}\}$
into a larger $(n+k) \times (n+k)$ matrix of the form

$$A = \{a_{ij}\} = \begin{pmatrix} 0 & (e-y)^{T} \\ y & 0 \end{pmatrix} \tag{4.23}$$

where $e$ is an $n \times k$ matrix of 1's, so that, for all $(i,j)$ in the two non-zero blocks,

$$a_{ij} + a_{ji} = 1. \tag{4.24}$$


Then he notes that the Rasch model of (4.21) is transformed into an incomplete version
of the Bradley-Terry model of expression (4.1) discussed at the beginning of this
section, i.e.

$$P(a_{ij} = 1) = \frac{\pi_i}{\pi_i + \pi_j}, \quad i = k+1, \ldots, k+n, \quad j = 1, 2, \ldots, k, \tag{4.25}$$

and similarly for the other non-zero block of entries in $A$, where

$$\log \frac{\pi_{k+i}}{\pi_{k+r}} = \mu_i - \mu_r, \quad i, r = 1, 2, \ldots, n, \quad i \neq r, \tag{4.26}$$

and

$$\log \frac{\pi_j}{\pi_s} = \nu_s - \nu_j, \quad j, s = 1, 2, \ldots, k, \quad j \neq s. \tag{4.27}$$

Thus, using a three-dimensional representation for $A$ alluded to at the beginning of this
section, we can show that estimation results for the UML approach to the Rasch model
correspond to those for the no-second-order interaction model applied to an
incomplete three-dimensional contingency table consisting of two zero blocks of dimension
$k \times k \times 2$ and $n \times n \times 2$, and a duplicated version of the $n \times k \times 2$ table with layers $y$ and $e - y$.
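The embedding can be illustrated with a short sketch. The code below places a small response matrix y in the lower-left block of an (n+k)×(n+k) matrix, with items indexed first and individuals after them, fills the opposite block with the complementary entries, and then checks numerically that a Rasch response probability coincides with a Bradley-Terry "preference" probability for suitably chosen weights. The response matrix, parameter values and function names are hypothetical, and the particular choice of weights is one convenient normalization rather than the one used by Fischer.

```python
import math

def fischer_embedding(y):
    """Embed an n x k 0/1 response matrix in an (n+k) x (n+k) paired-comparison matrix."""
    n, k = len(y), len(y[0])
    size = n + k
    a = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(k):
            a[k + i][j] = y[i][j]        # lower-left block: y
            a[j][k + i] = 1 - y[i][j]    # upper-right block: transpose of e - y
    return a

def bradley_terry_weights(psi, mu, nu):
    """Weights with pi[k+i] / pi[j] = exp(psi + mu_i + nu_j); items first, then individuals."""
    return [math.exp(-v) for v in nu] + [math.exp(psi + m) for m in mu]

y = [[1, 0], [1, 1], [0, 0]]                       # hypothetical responses, n = 3, k = 2
a = fischer_embedding(y)

psi, mu, nu = 0.2, [-1.0, 0.0, 1.0], [0.5, -0.5]   # hypothetical Rasch parameters
pi = bradley_terry_weights(psi, mu, nu)
n, k = len(mu), len(nu)
i, j = 2, 0
bt_prob = pi[k + i] / (pi[k + i] + pi[j])
rasch_prob = 1.0 / (1.0 + math.exp(-(psi + mu[i] + nu[j])))
print(bt_prob, rasch_prob)
```

In this reading a correct answer is a "win" for the individual over the item, which is why comparisons are only ever observed between an individual and an item and the Bradley-Terry design is incomplete.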


Now we turn to a conditional approach to likelihood estimation (CML) advocated
initially by Rasch, who noted that the conditional distribution of $Y$ given the individual
marginal totals $\{Y_{i+} = y_{i+}\}$ depends only on the item parameters, $\{\nu_j\}$. Then each of
the row sums $\{y_{i+}\}$ can take only $k+1$ distinct values corresponding to the number of
correct responses. Next, we recall the alternate representation of the data in the form
of an $n \times 2^k$ array, $\{W_{i j_1 j_2 \ldots j_k}\}$, as given by expression (4.20). Adding across individuals
we create a $2^k$ contingency table, $X$, with entries

$$X_{j_1 j_2 \ldots j_k} = W_{+ j_1 j_2 \ldots j_k}. \tag{4.28}$$

Earlier, we asked the question of whether we could work with this collapsed array.
The answer is yes, since all of the information we need to preserve is the response
pattern, i.e. $\{j_1, j_2, \ldots, j_k\}$, and the number of "correct" responses that correspond to that
pattern. Such information allows us to completely reconstruct the original matrix of
responses, $Y$, except for the labelling of individuals, and thus we can use the $2^k$ array
$x$ to represent the conditional distribution of $X$ given $\{Y_{i+} = y_{i+}\}$.
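The collapsing step is easy to make concrete. The sketch below counts how many individuals exhibit each response pattern, which gives the entries of the 2^k table, and then tabulates the number of individuals at each raw score. The response matrix is a hypothetical example with k = 3 items, and the function name is an assumption made for illustration.

```python
from collections import Counter

def collapse_patterns(y):
    """Count how many individuals show each response pattern (j_1, ..., j_k)."""
    return Counter(tuple(row) for row in y)

# Hypothetical responses of n = 5 individuals to k = 3 items.
y = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
x = collapse_patterns(y)

# Number of individuals with each raw score (number of correct responses).
score_counts = Counter()
for pattern, count in x.items():
    score_counts[sum(pattern)] += count

# The original matrix can be rebuilt from x, up to the ordering of individuals.
rebuilt = sorted(x.elements())

print(dict(x))
print(dict(score_counts))
```

Because each pattern carries its own raw score, nothing needed for conditioning on the row totals is lost when the individual labels are discarded.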


Duncan (1982) and Tjur (1981) independently noted that we can estimate the item
parameters for the Rasch model of (4.21) using the $2^k$ array $x$, and the loglinear model

$$\log m_{j_1 j_2 \ldots j_k} = \omega + \sum_{s=1}^{k} \delta_{j_s} \nu_s + \gamma_{j_+} \tag{4.29}$$

where the subscript $j_+ = \sum_{s=1}^{k} j_s$, $\delta_{j_s} = 1$ if $j_s = 1$ and is 0 otherwise, and

$$\sum_{s=1}^{k} \nu_s = 0. \tag{4.30}$$

The amazing result, due to Tjur (1981), is that maximum likelihood estimation of the
$2^k$ contingency table of expected values, $m = \{m_{j_1 j_2 \ldots j_k}\}$, using a Poisson sampling
scheme and the loglinear model (4.29), produces the conditional maximum likelihood
estimates of $\{\nu_j\}$ for the original Rasch model. Tjur proves this equivalence by (1)
assuming that the individual parameters are independent identically distributed random
variables from some completely unknown distribution; (2) integrating the conditional
distribution of $Y$ given $\{Y_{i+} = y_{i+}\}$ over this mixing distribution; (3) embedding this
"random effects" model in an "extended random model"; and (4) noting that the
likelihood for the extended model is equivalent to that for (4.29) applied to $x$ (using
Result 4 of Section 2 above).

TABLE 4-3

Multiplicative Representation of Expected Values of Model (4.29) for the Case k = 3

                        Item A: Yes                  Item A: No
                 Item C: Yes   Item C: No     Item C: Yes   Item C: No
Item B: Yes        abcS_3        abS_2          bcS_2         bS_1
Item B: No         acS_2         aS_1           cS_1          S_0

For $k = 3$, the loglinear version of the Rasch model for the $2^k$ table, i.e. (4.29), can be
represented in multiplicative form for the expected values $m_{j_1 j_2 j_3}$ as in Table 4-3. The
minimal sufficient statistics are

$$\{x_{j_1 ++}\},\ \{x_{+ j_2 +}\},\ \{x_{++ j_3}\} \tag{4.31}$$

and

$$(x_{111},\ x_{110} + x_{101} + x_{011},\ x_{100} + x_{010} + x_{001},\ x_{000}). \tag{4.32}$$

But these are the minimal sufficient statistics of the model of quasi-symmetry
preserving one-dimensional marginal totals which was first proposed by Bishop,
Fienberg, and Holland (1975, Chapter 8). Indeed, that model is equivalent to (4.29).
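The multiplicative structure for three items can be written out directly. In the sketch below each expected count is the product of the effects of the items answered "yes" and a factor that depends only on the number of "yes" answers. The symbols a, b, c and S follow the multiplicative form described in the text, but the numerical values are hypothetical, chosen so that the item effects multiply to one, and the function name is an assumption.

```python
from itertools import product

def rasch_table(a, b, c, s):
    """Expected counts m[(j1, j2, j3)] = a**j1 * b**j2 * c**j3 * s[j1 + j2 + j3]."""
    item_effects = (a, b, c)
    table = {}
    for pattern in product((1, 0), repeat=3):
        value = s[sum(pattern)]               # score factor S_0, ..., S_3
        for response, effect in zip(pattern, item_effects):
            if response == 1:
                value *= effect               # multiply in the item effect for each "yes"
        table[pattern] = value
    return table

# Hypothetical values: a * b * c = 1 mirrors the sum-to-zero constraint on the log scale.
m = rasch_table(a=2.0, b=0.5, c=1.0, s=[10.0, 6.0, 4.0, 3.0])

# Ratios of cells with the same score involve only the item effects.
ratio_score1 = m[(1, 0, 0)] / m[(0, 1, 0)]    # a / b
ratio_score2 = m[(1, 1, 0)] / m[(0, 1, 1)]    # a / c
print(ratio_score1, ratio_score2)
```

The score factors cancel in any comparison of cells with the same number of "yes" answers, which is the quasi-symmetry structure and the reason the item parameters can be estimated from the collapsed table alone.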


Thus following the prescription of Bishop, Fienberg, and Holland (1975, p. 305), we can
re-represent the data in a 4-dimensional redundant form (as a 2×2×2×6 table) and
estimate the Rasch model item parameters using a standard loglinear model fitted to a
4-way table (although not the 4-way table $w$ of expression (4.20)). Additional
simplifications ensue here because

$$\hat{m}_{111} = x_{111}, \qquad \hat{m}_{000} = x_{000}. \tag{4.33}$$

Plackett (1981), in a very brief section of the 2nd edition of his monograph on
categorical data analysis, notes that the Q-statistic of Cochran (1950) can be viewed as a
means of testing that the item parameters in the Rasch model are all equal and thus
zero, i.e. $\nu_j = 0$ for all $j$. This observation is intimately related to the results just
described, and our original data representation in the form of an $n \times k$ (individual by
item) array $y$ is exactly the same representation used by Cochran. By carrying out a
conditional test for the equality of marginal proportions given model (4.29), i.e.
quasi-symmetry preserving one-dimensional marginals, we get a test that is essentially
equivalent to Cochran's test. But this is also the test for $\{\nu_j = 0\}$ within model (4.29).
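For reference, Cochran's Q can be computed directly from the individual-by-item array. The sketch below uses the standard formula based on the item (column) totals, the individual (row) totals and the grand total; under the hypothesis of equal item effects Q is usually referred to a chi-square distribution with k - 1 degrees of freedom. The data matrix is hypothetical and the function name is an assumption made for illustration.

```python
def cochran_q(y):
    """Cochran's Q statistic for an n x k array of 0/1 responses."""
    k = len(y[0])
    col_totals = [sum(row[j] for row in y) for j in range(k)]
    row_totals = [sum(row) for row in y]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0's or all 1's")
    return numerator / denominator

# Hypothetical responses of 6 individuals to 3 items.
y = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]
q_all = cochran_q(y)

# Individuals who answer every item the same way do not affect Q.
y_informative = [row for row in y if 0 < sum(row) < len(row)]
q_informative = cochran_q(y_informative)
print(q_all, q_informative)
```

That the all-0 and all-1 rows drop out of Q parallels the conditional argument, in which individuals with extreme raw scores carry no information about the item parameters.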

Duncan (1982) gives several examples of the application of the Rasch model to survey
research problems, and he presents several extensions of the model, indicating how they
can be represented in a multi-dimensional table format such as that of Table 4-3.

5.  COMPUTATION  FOR  LOGLINEAR  MODEL  METHODS 

As we noted in Section 3 on multi-dimensional contingency tables, we do not
necessarily get closed-form expressions for the MLE's $\hat{m}$ of the expected counts. Thus
some form of iterative numerical procedure is often required. The most popular
numerical procedure for calculating MLE's is the method of iterative proportional
fitting (IPFP), which iteratively adjusts the entries of a contingency table to have
marginal totals specified by the likelihood equations.

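To illustrate the algorithm we consider a three-way table of independent Poisson
counts, $x = \{x_{ijk}\}$. Suppose we wish to fit the loglinear model of no-second-order
interaction for the mean $m$, i.e. the model given by expression (3.4). The basic IPFP
takes an initial table $\hat{m}^{(0)}$, such that $\log \hat{m}^{(0)}$ satisfies the model (typically we would
use $\hat{m}^{(0)}_{ijk} = 1$ for all $i$, $j$, and $k$) and sequentially scales the current fitted table to
satisfy the three sets of two-way margins of the observed table, $x$. The $\nu$th
iteration (taking $\hat{m}^{(0,3)} = \hat{m}^{(0)}$) consists of three steps which form:

$$\begin{aligned}
\hat{m}^{(\nu,1)}_{ijk} &= \hat{m}^{(\nu-1,3)}_{ijk} \cdot \frac{x_{ij+}}{\hat{m}^{(\nu-1,3)}_{ij+}}, \\
\hat{m}^{(\nu,2)}_{ijk} &= \hat{m}^{(\nu,1)}_{ijk} \cdot \frac{x_{i+k}}{\hat{m}^{(\nu,1)}_{i+k}}, \\
\hat{m}^{(\nu,3)}_{ijk} &= \hat{m}^{(\nu,2)}_{ijk} \cdot \frac{x_{+jk}}{\hat{m}^{(\nu,2)}_{+jk}}.
\end{aligned} \tag{5.1}$$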

(The first superscript refers to the iteration number, and the second to the step number
within iterations.) The algorithm continues until the observed and fitted margins are
sufficiently close. For a detailed discussion of convergence and some of the other
properties of the algorithm, see Bishop, Fienberg and Holland (1975) or Haberman
(1974).
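The three-step cycle can be sketched for a small table as follows. The code starts from a table of ones and rescales in turn to match the three sets of observed two-way margins, stopping when the fitted and observed margins agree to within a tolerance. The 2×2×2 table of counts, the tolerance and the function name are hypothetical choices for illustration; this is a minimal sketch rather than a production implementation, and it assumes the observed table is held as nested lists.

```python
def ipfp_no_second_order(x, tol=1e-8, max_iter=1000):
    """Fit the no-second-order-interaction model to a three-way table by IPFP."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]   # starting table of ones
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Step 1: scale to match the observed margins x_{ij+}.
        for i in range(I):
            for j in range(J):
                fitted = sum(m[i][j])
                ratio = sum(x[i][j]) / fitted if fitted > 0 else 0.0
                for k in range(K):
                    m[i][j][k] *= ratio
        # Step 2: scale to match the observed margins x_{i+k}.
        for i in range(I):
            for k in range(K):
                fitted = sum(m[i][j][k] for j in range(J))
                ratio = sum(x[i][j][k] for j in range(J)) / fitted if fitted > 0 else 0.0
                for j in range(J):
                    m[i][j][k] *= ratio
        # Step 3: scale to match the observed margins x_{+jk}.
        for j in range(J):
            for k in range(K):
                fitted = sum(m[i][j][k] for i in range(I))
                ratio = sum(x[i][j][k] for i in range(I)) / fitted if fitted > 0 else 0.0
                for i in range(I):
                    m[i][j][k] *= ratio
        # After step 3 the {+jk} margins match; check the other two sets.
        gap_ij = max(abs(sum(x[i][j]) - sum(m[i][j]))
                     for i in range(I) for j in range(J))
        gap_ik = max(abs(sum(x[i][j][k] for j in range(J)) - sum(m[i][j][k] for j in range(J)))
                     for i in range(I) for k in range(K))
        if max(gap_ij, gap_ik) < tol:
            break
    return m, iteration

# Hypothetical 2 x 2 x 2 table of counts.
x = [[[10, 20], [30, 15]],
     [[25, 5], [12, 40]]]
m_hat, n_iter = ipfp_no_second_order(x)

# Under the fitted model the conditional odds ratio is the same in both layers.
odds_ratio_layer0 = m_hat[0][0][0] * m_hat[1][1][0] / (m_hat[0][1][0] * m_hat[1][0][0])
odds_ratio_layer1 = m_hat[0][0][1] * m_hat[1][1][1] / (m_hat[0][1][1] * m_hat[1][0][1])
print(n_iter, odds_ratio_layer0, odds_ratio_layer1)
```

Only the current fitted table and the observed margins need to be held in memory, which is the storage advantage of the method relative to procedures that work with a full matrix of second derivatives.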

Common alternatives to the IPFP are versions of Newton's method or other
algorithms which use information about the second derivatives of the likelihood
function. While such methods have quadratic convergence properties compared to the
linear properties of the IPFP, and are often quite efficient (see e.g. Haberman (1974),
or Fienberg, Meyer and Stewart (1979)), they are of limited use for models of high
dimensionality because of storage requirements. Newton's method also automatically
produces an estimate of the variance-covariance matrix of the parameters, but this is
what requires all of the storage space. Currently, the most widely used computer
program that employs a Newton-like algorithm is GLIM, which is distributed by the
Numerical Algorithms Group of the United Kingdom (Baker and Nelder, 1978).

Recent research on numerical procedures for maximum likelihood estimation in
loglinear models has focussed on alternative algorithms that will handle the types of
large data arrays that arise in practical problems. For example, Fienberg, Meyer, and
Wasserman (1981) describe an application of the social network methodology of Section
4.2 in which the basic data consist of three correlated 73×73 adjacency matrices. We
briefly outline three different approaches that have been proposed to handle large data
arrays.

One approach to increasing the storage capacity of current problems is found in work
in progress by Fienberg, Meyer, and Stewart (1981), who have been developing programs
for both loglinear and logit models using a variant of Newton's method. Their
algorithms involve the construction of the upper half of a $p \times p$ weighted cross-product
matrix where $p$ is the dimension of the parameter vector $\theta$, and take full advantage of
the sparseness of the $n \times p$ design matrix without actually constructing it. The algorithms
proceed via Newton's method with variable step length, using a Cholesky decomposition
with pivoting. A special feature of these algorithms is a subroutine that checks for the
existence of MLE's, $\hat{m}$, by performing a pivoted Cholesky decomposition on a
substantially reduced problem. It should be possible to use these algorithms, when they
become available, as replacements for the Newton-like algorithms in programs such as
GLIM.

McIntosh (1981) has proposed the use of yet another alternative to IPFP, the method
of conjugate gradients. Unlike Newton's method, which uses the full matrix of second
derivatives of the likelihood function, the method of conjugate gradients works by
carrying out an "optimal" sequence of one-dimensional maximizations. The method of
conjugate gradients has storage requirements similar to those of IPFP, but has
"superlinear" convergence properties. McIntosh (1981) provides numerical comparisons
of different algorithms for several contingency table examples but these fail to
demonstrate the areas of superiority of the current versions of his conjugate gradient
algorithms, which have been implemented within GLIM.

Finally, we note the recent work of Meyer (1981), who considers generalizations of
IPFP due to both Haberman (1975) and Csiszar (1975). Meyer has developed a new
method for computing MLE's that is especially attractive for large problems and which
combines the advantages of both Newton's method and IPFP. Basically, his approach is
to break the large problem into manageable but overlapping subproblems. Then he
iterates in an IPFP-like manner amongst the subproblems, for each of which he uses
Newton's method.

All of the computational approaches just discussed are currently under active
development. We expect that these and other efforts will ultimately expand the scope
and size of categorical data problems that can be analyzed using loglinear model
methods.

6. CONCLUDING REMARKS

In this lecture I have examined a variety of categorical data problems using models
that are linear in the logarithms of the expected cell values. The methods and models
are linked to a small core of theoretical statistical results depending on exponential
family theory, and the concepts of minimal sufficient statistics and maximum likelihood
estimation. All of these results have as their foundation the research work of Sir R.A.
Fisher.
The building of bridges from statistical theory to statistical practice is an activity
which Fisher thought to be especially appropriate for ISI Meetings. I hope that many
of you will have crossed such a bridge with me today, and in the process gained an
appreciation for the richness of the theoretical results on loglinear models for
categorical data analysis, and the many different practical areas to which they may be
applied.
SUMMARY

The past 20 years have seen an enormous growth in the statistical literature on the
analysis of categorical data, much of it based on the use of loglinear models. This
paper reviews some of the general results on maximum likelihood estimation for
loglinear models and links them back to ideas that have their foundations in the work
of Sir R.A. Fisher. These results have special relevance for the analysis of
multidimensional contingency tables, and for the reporting of data from large-scale
sample surveys. In addition, the results are applicable to other categorical data
problems that are often representable in contingency table form. The paper concludes
with a brief description of the state of the art of computation for loglinear model
methods.

RESUME

The past twenty years have witnessed considerable growth in the statistical literature
dealing with the analysis of contingency tables, often using log-linear models.
This article reviews some general results on maximum likelihood estimation
for log-linear models, and links them to ideas
originating in the work of Sir R.A. Fisher. These results bear particularly on
the analysis of multidimensional contingency tables, and on the reporting of data
from large-scale surveys. In addition, these results can be used in the analysis of other
categorical data that admit a tabular presentation. The article concludes with a
short description of the numerical methods currently used for the analysis of
log-linear models.